Abstract

Tree rewrite systems with typed variables can be used to represent many tree transformations in a more compact form than systems with untyped variables. By building on the work of Eduardo Pelegri-Llopart, it is possible to generate linear-time optimal solutions to SET-REACHABILITY (a generalization of REACHABILITY with a possibly infinite goal set) for a useful class of typed rewrite systems. The algorithms developed can also handle some untyped systems that are not in BURS, such as systems with rules of the form $X \to a[X]$.

An experiment involving a rewrite system for instruction selection for the Motorola 68000 produced table sizes an order of magnitude larger than those produced by an untyped rewrite system for the same task. It is not clear whether this table size can be limited, or whether it is an inherent cost of the power given by types. Although discouraging for the instruction selection application, the table sizes are small enough (under 100k bytes) that the techniques may be useful for smaller applications, or in cases where the added expressibility of typed variables outweighs the size explosion of the tables.
1 Introduction

A tree transformation system is a relation between trees in an input language and trees in an output language. Many problems in compiler writing are naturally modeled using tree transformation systems: e.g., the input and output languages may be two different intermediate representations, with the tree transformation system defining the possible implementations of the input language in terms of the output language. Usually the tree transformation system is described in some compact form, and the problem is to find some (possibly optimal) output tree that is related to a given input tree.

Eduardo Pelegri-Llopart's PhD dissertation [2] focuses on tree rewrite systems. A rewrite system is a collection of functions, called rules, from trees to trees. An individual rule contains an input pattern, which specifies the set of input trees to which the rule applies (those trees which the pattern matches), and an output pattern, which constructs an output tree, possibly using pieces of the input tree. A tree $t_1$ can be rewritten into a tree $t_n$ if there is a sequence of trees $t_1, \ldots, t_n$ where $t_{i+1}$ is obtained from $t_i$ by replacing some subtree with its image under a rule in the rewrite system. A tree transformation system can be obtained from a rewrite system by defining two trees to be related if one can be rewritten into the other.

Given a rewrite system $R$, a goal tree $g$, and an input tree $t$, the REACHABILITY problem is to find a sequence of rewrites from $t$ to $g$, or show that there is no such sequence. Pelegri showed how to solve the REACHABILITY problem efficiently in the case that $R$ and $g$ are fixed and only $t$ varies, for rewrite systems in the class BURS. His solution yields optimal rewrite sequences, given a linear cost function on the rewrite sequences.

In a REACHABILITY problem, the main item of interest is the rewrite sequence from the tree to the goal; some meaning outside of the rewrite system is attached to individual rules, and is used to solve the particular problem at hand (e.g., code generation). Often the original problem calls for a tree to be generated. If this is the case, the SET-REACHABILITY problem may be more useful: given a rewrite system $R$, a set of goal trees $G$, and an input tree $t$, find a sequence of rewrites from $t$ to any tree in $G$, or show that there is no such sequence.
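To make the problem statement concrete, the following Python sketch solves small SET-REACHABILITY instances by brute force, as a uniform-cost search over the trees reachable from $t$; the sequence it returns is therefore cheapest under a linear cost function (the sum of per-rule costs). It is not CRSTNA's method: its running time can be exponential, and because the search stops after `max_states` distinct trees, a result of `None` only means that no goal tree was found within that budget. The nested-tuple tree encoding, the rule functions, and the goal predicate (modeled on the example specification in Section 2.1) are assumptions of this sketch.

```python
import heapq
import itertools

# Trees are nested tuples (label, child1, ..., childn); a leaf is (label,).
# A rule is a partial function on trees: it returns None where it is undefined.


def one_step(t, rules):
    """Yield (rule name, position, result) for every single rewrite of t."""
    for name, rule in rules.items():
        image = rule(t)
        if image is not None:
            yield name, (), image
    for i, child in enumerate(t[1:], start=1):
        for name, pos, new_child in one_step(child, rules):
            yield name, (i,) + pos, t[:i] + (new_child,) + t[i + 1:]


def set_reachability(t, rules, costs, in_goal, max_states=10_000):
    """Uniform-cost search for a cheapest rewrite sequence from t into the goal
    set.  Returns (cost, [(rule name, position), ...]), or None if no goal tree
    was reached within max_states distinct trees (not a proof of absence)."""
    tie = itertools.count()                # tie-breaker: the heap never compares trees
    frontier = [(0, next(tie), t, [])]
    seen = set()
    while frontier and len(seen) < max_states:
        cost, _, tree, seq = heapq.heappop(frontier)
        if tree in seen:
            continue
        seen.add(tree)
        if in_goal(tree):
            return cost, seq
        for name, pos, nxt in one_step(tree, rules):
            if nxt not in seen:
                heapq.heappush(frontier, (cost + costs[name], next(tie), nxt, seq + [(name, pos)]))
    return None


# Hypothetical rules and goal set, modeled on the example specification in Section 2.1.
def commute(t):    # +[X, Y] -> +[Y, X]
    return ("+", t[2], t[1]) if t[0] == "+" else None


def incr(t):       # +[X, 1] -> add1[X]
    return ("add1", t[1]) if t[0] == "+" and t[2] == ("1",) else None


def drop_zero(t):  # +[X, 0] -> X
    return t[1] if t[0] == "+" and t[2] == ("0",) else None


def in_goal(t):    # goal type g: trees built from ident[], +[g, g] and add1[g]
    arity = {"ident": 0, "+": 2, "add1": 1}.get(t[0])
    return arity == len(t) - 1 and all(in_goal(c) for c in t[1:])


rules = {"commute": commute, "incr": incr, "drop_zero": drop_zero}
costs = {name: 1 for name in rules}
print(set_reachability(("+", ("1",), ("ident",)), rules, costs, in_goal))
# (2, [('commute', ()), ('incr', ())])
```

The heap holds `(cost, counter, tree, sequence)` entries, so the first goal tree popped is reached by a minimum-cost sequence, as in Dijkstra's algorithm with non-negative rule costs.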
This report describes CRSTNA¹, a system that produces an efficient solver of SET-REACHABILITY given an $R$ and $G$ satisfying certain restrictions. CRSTNA extends Pelegri's work by

•  solving  SET-REACHABILITY,  a  generalization  of  REACHABILITY, 

•  extending  the  power  of  rewrite  rules  by  allowing  a  more  powerful  type 
of  input  pattern,  and 

•  accepting  a  wider  variety  of  rewrite  systems. 

Familiarity  with  Pelegri’s  dissertation  is  not  required  to  use  CRSTNA 
or  to  verify  the  theory  in  this  report,  but  it  covers  the  uses  and  motivation 
behind  rewrite  systems  in  much  greater  detail.  This  report  assumes  that 
readers  are  familiar  with  his  work,  or  with  other  recent  work  in  rewrite 
systems.  In  particular,  this  report  stops  short  of  providing  a  complete 
description  of  the  theory  and  implementation,  since  the  final  parts  of  the 
solution  are  identical  to  Pelegri’s. 

2  A  User’s  View 

In this section, we informally describe the kinds of rewrite systems CRSTNA is intended to process, and consider the limitations imposed by the algorithms used in CRSTNA.

2.1  Specifying  trees,  patterns,  and  rewrite  rules 

Trees in CRSTNA are rooted, ordered, acyclic graphs labeled with operators. Each operator has an associated integer called its arity; a node labeled with the operator $a$ must have as many children as the arity of $a$. We use the notation $a[c_1, \ldots, c_n]$ to denote the tree with the root labeled by $a$ and with children $c_1$ through $c_n$. For example, the tree $+[reg_1[], reg_2[]]$ represents a machine add instruction.

A pattern is a tree in which some of the operators are variables. CRSTNA only allows patterns in which all of the variables have zero arity, and each variable occurs at most once.² A variable has associated with it a set of trees called its type. A pattern $g$ matches a tree $t$ if

1  Compositional  Rewrite  System  Tool  with  Natural  Automata. 

2 linear  n-patterns,  in  Pelegri’s  terminology. 
• $g$ and $t$ have the same label at the root, and each child of $g$'s root matches the corresponding child of $t$, or

• $g$ is labeled by a variable at the root, and $t$ is a member of the type of the variable.

Variables are denoted with upper-case letters; their type may be specified by following the variable name with a colon and the name of the type. Variables with no specified type have the universal type (the set of all trees). Thus, the pattern $X$ matches all trees; $+[X, reg_2[]]$ matches $+[reg_1[], reg_2[]]$ and $+[+[reg_1[], reg_2[]], reg_2[]]$, among others. If $r$ is the type $\{reg_1[], reg_2[]\}$, then $add1[X : r[]]$ matches $add1[reg_1[]]$ but not $add1[reg_3[]]$.

A rewrite rule is specified with an input pattern and an output pattern, and is written $\alpha \to \beta$, where $\alpha$ is the input pattern and $\beta$ the output pattern. Each variable in the output pattern must occur exactly once in the input pattern. $\alpha \to \beta$ specifies the function $r$ defined as follows:

• If $\alpha$ does not match $t$, $r(t)$ is undefined.

• If $\alpha$ matches $t$, $r(t)$ is $\beta$ with the variables in $\beta$ replaced by the subtrees of $t$ that are matched by the corresponding variables in $\alpha$.


For example,

$$+[X[], Y[]] \to +[Y[], X[]]$$

maps the tree $+[1[], reg_1[]]$ into $+[reg_1[], 1[]]$, and

$$+[X[], 1[]] \to add1[X[]]$$

maps $+[reg_1[], 1[]]$ into $add1[reg_1[]]$.
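A minimal Python sketch of matching and rule application as just described, assuming trees are nested tuples `(label, child1, ..., childn)` and that a variable's type is given as a membership predicate; the names `Var`, `match`, `instantiate`, and `apply_rule` are hypothetical. The calls reproduce the two examples above and the typed example with $r = \{reg_1[], reg_2[]\}$.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Trees are nested tuples (label, child1, ..., childn); a leaf is (label,).


@dataclass(frozen=True)
class Var:
    """A nullary pattern variable; typ is a membership test (None = all trees)."""
    name: str
    typ: Optional[Callable[[tuple], bool]] = None


def match(pattern, tree, binding=None):
    """Return {variable name: matched subtree} if pattern matches tree, else None."""
    binding = {} if binding is None else binding
    if isinstance(pattern, Var):
        if pattern.typ is not None and not pattern.typ(tree):
            return None
        binding[pattern.name] = tree
        return binding
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return None
    for p_child, t_child in zip(pattern[1:], tree[1:]):
        if match(p_child, t_child, binding) is None:
            return None
    return binding


def instantiate(pattern, binding):
    """Replace each variable of pattern by the subtree it was bound to."""
    if isinstance(pattern, Var):
        return binding[pattern.name]
    return (pattern[0],) + tuple(instantiate(c, binding) for c in pattern[1:])


def apply_rule(lhs, rhs, tree):
    """The partial function defined by lhs -> rhs; None where it is undefined."""
    binding = match(lhs, tree)
    return None if binding is None else instantiate(rhs, binding)


X, Y = Var("X"), Var("Y")
reg1, reg3, one = ("reg1",), ("reg3",), ("1",)
print(apply_rule(("+", X, Y), ("+", Y, X), ("+", one, reg1)))   # ('+', ('reg1',), ('1',))
print(apply_rule(("+", X, one), ("add1", X), ("+", reg1, one)))  # ('add1', ('reg1',))
r = lambda t: t in {("reg1",), ("reg2",)}                       # the type r = {reg1[], reg2[]}
print(match(("add1", Var("X", r)), ("add1", reg3)))              # None
```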

The types used in a rewrite system are specified with a set of type instances. A type instance is a statement of the form

$$g \text{ is-a } t,$$

where $g$ is a pattern and $t$ is a type name; it states that any tree matching $g$ is an element of the type named by $t$. This is a recursive definition: $g$ is allowed to have variables with $t$ as their type, or with types specified in terms of $t$. For example, given the set of type instances


true[]          is-a t
not[X : t[]]    is-a f
not[X : f[]]    is-a t,

$f$ is the set of trees with an odd number of nodes labeled not and with one leaf, labeled true.
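Because every variable in a pattern is a leaf, the types of a tree can be computed bottom-up from the types of its subtrees. The Python sketch below does this for the example above; the encoding of type instances as `(pattern, type name)` pairs and the names `TVar` and `types_of` are hypothetical. At each node it iterates to a least fixed point, which only matters for instances whose pattern is a bare variable.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional

# Trees are nested tuples (label, child1, ..., childn); a leaf is (label,).


@dataclass(frozen=True)
class TVar:
    """A pattern variable whose type is given by name (None = the universal type)."""
    name: str
    typ: Optional[str] = None


# Each type instance "g is-a T" is encoded as the pair (g, T).
INSTANCES = (
    (("true",), "t"),                  # true[]        is-a t
    (("not", TVar("X", "t")), "f"),    # not[X : t[]]  is-a f
    (("not", TVar("X", "f")), "t"),    # not[X : f[]]  is-a t
)


def _matches(pattern, tree, root_types):
    """Does pattern match tree?  root_types holds the types found so far for
    tree itself; the types of proper subtrees come from types_of()."""
    if isinstance(pattern, TVar):
        return pattern.typ is None or pattern.typ in root_types
    if pattern[0] != tree[0] or len(pattern) != len(tree):
        return False
    return all(_matches(p, c, types_of(c)) for p, c in zip(pattern[1:], tree[1:]))


@lru_cache(maxsize=None)
def types_of(tree):
    """The set of type names T with tree in T, computed bottom-up (memoized)."""
    found = set()
    changed = True
    while changed:  # least fixed point at this node
        changed = False
        for pattern, name in INSTANCES:
            if name not in found and _matches(pattern, tree, found):
                found.add(name)
                changed = True
    return frozenset(found)


def nots(n):
    """not[not[...not[true[]]...]] with n nodes labeled not."""
    tree = ("true",)
    for _ in range(n):
        tree = ("not", tree)
    return tree


print([sorted(types_of(nots(n))) for n in range(4)])  # [['t'], ['f'], ['t'], ['f']]
```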

CRSTNA requires a set of type instances, a set of rewrite rules, and a goal type; it then produces an efficient algorithm that, given an input tree, finds a sequence of rewrites from the input tree to a goal tree. If costs are associated with rewrite rules, CRSTNA will find a rewrite sequence with minimum cost (where the cost of a sequence is the sum of the costs of the individual rules). As a final example, the following specification might be used to remove additions of zero, and replace additions of one by increment operators, in simple expression trees:
g                      is the goal type
ident[]                is-a g
+[X : g[], Y : g[]]    is-a g
add1[X : g[]]          is-a g

+[X[], Y[]] → +[Y[], X[]]
+[X[], 1[]] → add1[X[]]
+[X[], 0[]] → X[]


2.2  Restrictions  on  types 


Sets of type instances are equivalent in power to labeled bottom-up finite state automata. The types that can be specified are precisely the recognizable tree languages.³ In order for CRSTNA to solve SET-REACHABILITY, the types must be closed under the rewrite system, i.e., if $T$ is a type used in the system, then for all $t$ in $T$, if $t$ can be rewritten into some tree $t'$, then $t'$ must be in $T$.

This might seem to be a severe restriction on the power of CRSTNA, but it is perfectly acceptable for applications in which the types of a tree (i.e., those types of which it is a member) say something about the semantics of the tree. In this case, enforcing the closure of the type system corresponds to insisting that a rewrite rule can only enrich, not destroy, the semantic information corresponding to a tree.

3 Bottom-up finite state automata, or BFSAs, are a natural generalization of string automata to trees; the tree languages that can be recognized by BFSAs are called recognizable, as are the string languages that can be recognized by string automata.
2.3 Restrictions on rewrite rules

CRSTNA generates linear-time solvers; they compute all of the information needed to solve SET-REACHABILITY in a single bottom-up pass over the input tree. This strongly limits the rewrite systems which CRSTNA can handle. There is no simple characterization of rules which are or are not acceptable; the interaction between different rules is of major importance. In practice, we have found two general problems that can occur:

1. There exists a sequence of rules which can pass information arbitrarily far down the tree. For example, the rule $a[b[X[]]] \to a[a[X[]]]$, applied successively to a tree of the form $a[b[b[\ldots]]]$, passes the information that there is an $a$ at the root arbitrarily far downward.

This  will  always  be  a  problem  in  any  system  which  attempts  to  track 
all  interesting  rewrites  in  a  single  bottom-up  pass  over  the  tree,  since 
it  is  impossible  to  know  if  a  particular  rewrite  is  possible  without 
information  arbitrarily  high  in  the  tree. 

2.  There  exists  a  sequence  of  rewrites  which  can  be  reapplied  at  the  root 
(but  not  at  subtrees)  an  unbounded  number  of  times,  depending  on 
the  size  of  the  tree.  This  requires  keeping  track  of  an  unbounded 
amount  of  information  while  traversing  the  tree,  and  thus  will  also 
always  be  a  problem  for  any  solution  which  does  its  computation  in 
a  single  bottom-up  traversal. 
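The first problem can be observed directly. The Python sketch below (nested-tuple trees and the hypothetical helpers `nest`, `rewrite_once`, and `rewrite_all`) applies the rule $a[b[X[]]] \to a[a[X[]]]$ until it no longer applies. On $a[b[b[b[c[]]]]]$ the rewrites occur at positions of depth 0, 1 and 2, while on a tree whose root is not $a$ none occur at all: whether a deep $b$ can be rewritten depends on a label that a bottom-up pass only reaches at the very end.

```python
# Trees are nested tuples (label, child); a leaf is (label,).


def nest(labels, leaf="c"):
    """labels[0][labels[1][...[leaf[]]...]] as nested tuples."""
    tree = (leaf,)
    for label in reversed(labels):
        tree = (label, tree)
    return tree


def rewrite_once(t, pos=()):
    """Apply a[b[X]] -> a[a[X]] at the first applicable position, searching from
    the root downward.  Returns (new tree, position) or None."""
    if t[0] == "a" and len(t) == 2 and t[1][0] == "b" and len(t[1]) == 2:
        return ("a", ("a", t[1][1])), pos
    if len(t) == 2:
        result = rewrite_once(t[1], pos + (1,))
        if result is not None:
            new_child, where = result
            return (t[0], new_child), where
    return None


def rewrite_all(t):
    """Rewrite until the rule no longer applies; return the tree and positions used."""
    positions = []
    while (result := rewrite_once(t)) is not None:
        t, where = result
        positions.append(where)
    return t, positions


print(rewrite_all(nest(["a", "b", "b", "b"]))[1])  # [(), (1,), (1, 1)]
print(rewrite_all(nest(["b", "b", "b"]))[1])       # []
```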

These two problems can be formalized, and I believe they form an exhaustive list; a future goal is to prove that CRSTNA can handle any system that has neither of these problems.

3  Implementation 

We  here  describe  the  theory  behind  CRSTNA,  and  a  few  of  the  details  of 
the  implementation. 

3.1  Typed  rewrite  systems 

We make use of the following notations. The domain of a function $f$ is written $\mathcal{D}(f)$. A sequence with elements $e_1$ through $e_n$ is written $e_1 \cdots e_n$; a singleton sequence is often denoted by its single element when it is clear from context that a sequence is required. The concatenation of two sequences $s_1$ and $s_2$ is written either as $s_1 s_2$ or as $s_1 /\!/ s_2$. The length of a sequence $s$ is denoted $\mathrm{length}(s)$. $s_1 \cdots \widehat{s_i} \cdots s_n$ denotes the sequence $s_1 \cdots s_{i-1} /\!/ s_{i+1} \cdots s_n$. The head of a sequence is its first element; the tail is the sequence with the first element removed. Tuples (sequences with fixed numbers of elements) are enclosed in angle brackets, with elements separated by commas. Functions are often described with defining equations: $:=$ reads "is defined to be", and $x \simeq y$ means that if $y$ is defined, then $x$ is defined to be $y$.

The trees used in CRSTNA are rooted, ordered, and labeled, where the label at a node determines the arity of that node. Formally, we will define a tree to be a mapping from positions in the tree to operators; positions will be described by sequences of integers, with the empty sequence (written $\epsilon$) corresponding to the root of the tree and the sequence $p/\!/i$ corresponding to the $i$th child of the node at position $p$. $t@p$ is the subtree of $t$ at position $p$, and $t @_p t'$ is the tree formed by replacing the subtree of $t$ at $p$ with $t'$.

Definition 1 An operator set $\mathcal{O}$ is a pair $\langle O, N\rangle$ where $O$ is any set and $N$ is a mapping from $O$ to the non-negative integers. The members of $O$ are called operators. The arity of an operator $o$ is $N(o)$. An operator with arity $n$ is called an $n$-ary operator; an operator with arity zero is called a nullary operator.

A position is a sequence of positive integers. $\mathcal{P}$ denotes the set of all positions.

A tree shape is a set $P$ of positions where, for all positions $p$ and $q$, $p/\!/q \in P$ implies $p \in P$, and for all positions $p$ and integers $i > 1$, $p/\!/i \in P$ implies $p/\!/(i-1) \in P$.

A tree $t$ over an operator set $\mathcal{O} = \langle O, N\rangle$ is a mapping from a tree shape $P$ to $O$ where, for all $p \in P$ and integers $i$, $p/\!/i \in P$ iff $0 < i \le N(t(p))$. $\mathcal{T}_{\mathcal{O}}$ denotes the set of trees over $\mathcal{O}$.

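For any function $f$ with $\mathcal{D}(f) \subseteq \mathcal{P}$, $f@p$ is the function

$$(f@p)(q) = f(p/\!/q).$$

$f @_p g$ is the function

$$(f @_p g)(q) = \begin{cases} g(s) & \text{if } q = p/\!/s \text{ for some } s, \\ f(q) & \text{otherwise.} \end{cases}$$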
We sometimes use the notation $a[t_1, \ldots, t_n]$ to denote the tree $t$ with $t(\epsilon) = a$ and $t@i = t_i$.
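These definitions translate almost literally into code. In the Python sketch below (an illustration, not CRSTNA's data structure), a tree is a dictionary from positions, encoded as tuples of positive integers with `()` standing for $\epsilon$, to operator names; `at` implements the subtree operation $f@p$, `replace` implements the replacement $f @_p g$, and `is_tree` checks the shape and arity conditions against an assumed arity table.

```python
from typing import Dict, Tuple

Position = Tuple[int, ...]      # () is the root; p + (i,) is the i-th child of p
PosTree = Dict[Position, str]   # a tree as a mapping from positions to operators


def to_positions(t, p=()):
    """Convert a nested-tuple tree (label, child1, ..., childn) to a position map."""
    result = {p: t[0]}
    for i, child in enumerate(t[1:], start=1):
        result.update(to_positions(child, p + (i,)))
    return result


def is_tree(f, arity):
    """Every non-root position's parent is present, and p//i is present
    iff 0 < i <= N(f(p)); together these give the tree-shape conditions."""
    for p, op in f.items():
        if p and (p[:-1] not in f or not 1 <= p[-1] <= arity[f[p[:-1]]]):
            return False
        if any(p + (i,) not in f for i in range(1, arity[op] + 1)):
            return False
    return True


def at(f, p):
    """f@p, defined by (f@p)(q) = f(p//q)."""
    n = len(p)
    return {q[n:]: op for q, op in f.items() if q[:n] == p}


def replace(f, p, g):
    """f @_p g: g(s) at positions p//s, f(q) at positions that do not extend p."""
    n = len(p)
    result = {q: op for q, op in f.items() if q[:n] != p}
    result.update({p + s: op for s, op in g.items()})
    return result


arity = {"+": 2, "add1": 1, "reg1": 0, "1": 0}
t = to_positions(("+", ("reg1",), ("1",)))
print(t)               # {(): '+', (1,): 'reg1', (2,): '1'}
print(at(t, (2,)))     # {(): '1'}
u = replace(t, (2,), to_positions(("add1", ("reg1",))))
print(is_tree(u, arity), is_tree({(): "+", (1,): "reg1"}, arity))  # True False
```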

In Pelegri's thesis, and in earlier sections, patterns may have variables as operators. A variable can be formally defined as a pair of a name and a type. In linear patterns (where any given variable appears at most once), only the type of the variable is important in determining which trees are matched by the pattern; thus, it is possible to have several different patterns, all of which are equivalent in terms of matching trees. This causes many theorems to be more difficult to state and prove than is inherently necessary.

We therefore use wildcards instead of variables in our formal work. A wildcard is like an anonymous variable; it consists simply of a type:

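Definition 2 A pattern over an operator set $\mathcal{O} = \langle O, N\rangle$ is a tree over the operator set $\langle O \cup 2^{\mathcal{T}_{\mathcal{O}}}, N'\rangle$, where

$$N'(x) = \begin{cases} N(x) & \text{if } x \in O, \\ 0 & \text{if } x \in 2^{\mathcal{T}_{\mathcal{O}}}. \end{cases}$$

A member of $2^{\mathcal{T}_{\mathcal{O}}}$ is called a wildcard.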

$\mathcal{W}(g) = \{q \mid g(q) \in 2^{\mathcal{T}_{\mathcal{O}}}\}$ is the set of wildcard positions of the pattern $g$.

A pattern $g$ matches a tree $t$ if, for all $p \in \mathcal{D}(g)$, $g(p) \in O$ implies $t(p) = g(p)$, and $g(p) \in 2^{\mathcal{T}_{\mathcal{O}}}$ implies $t@p \in g(p)$.

For a pattern $g$, $L(g)$ is the set $\{t \mid g \text{ matches } t\}$.

Two patterns $g_1$ and $g_2$ are equivalent, denoted $g_1 \equiv g_2$, if $L(g_1) = L(g_2)$.

Our definition allows the wildcards to be arbitrary sets of trees. This general definition creates problems for practical implementations, not the least of which is the specification of the wildcards. We will constrain wildcards by associating them with individual states of a bottom-up finite state automaton (BFSA):

Definition 3 Given an operator set $\langle O, N\rangle$, a deterministic bottom-up finite state automaton, or BFSA, is a pair $\langle S, \delta\rangle$ where $S$ is a finite set of states and $\delta$, the transition function, is a partial function $\delta : O \times S^* \to S$ where $\delta(a, s)$ is undefined for all $s$ with $\mathrm{length}(s) \ne N(a)$.

Given an operator set $\langle O, N\rangle$, a tree $t$, and a BFSA $A = \langle S, \delta\rangle$, the state assigned to $t$ by $A$, denoted by $A(t)$, is $\delta\bigl(t(\epsilon), A(t@1) \cdots A(t@N(t(\epsilon)))\bigr)$.
For a state $s$, $L(s)$ is the set $\{t \mid A(t) = s\}$.

A state $s$ is useful if $L(s)$ is non-empty. We henceforward assume that every state of $A$ is useful.

Given a BFSA $A$, a wildcard $W$ is single-state recognizable (SSR) if there exists a state $s$ in $S_A$ with $W = L(s)$. RECOG is the set of wildcards that are SSR by some BFSA.

Given a BFSA $A$, a pattern $g$ is wildcard-SSR if, for each wildcard position $w$ of $g$, $g(w)$ is SSR.

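If $g$ is wildcard-SSR by $A$, the state assigned to $g$ by $A$ is $A(g)$, where

$$A(g) = \begin{cases} \delta\bigl(g(\epsilon), A(g@1) \cdots A(g@N(g(\epsilon)))\bigr) & \text{if } g(\epsilon) \in O, \\ s & \text{if } g(\epsilon) = L(s). \end{cases}$$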
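A Python sketch of $A(g)$, assuming the transition function is stored as a dictionary from `(operator, tuple of child states)` to a state, with a missing key meaning that $\delta$ is undefined. A wildcard $L(s)$ is represented by a hypothetical `Wild(s)` leaf; since a tree is a pattern without wildcards, the same function computes $A(t)$. The example automaton, with state `T` for type $t$ and `F` for type $f$, is an assumed encoding of the true/not type instances of Section 2.1.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# delta maps (operator, tuple of child states) to a state; a missing key means
# that delta is undefined there.
Delta = Dict[Tuple[str, Tuple[str, ...]], str]


@dataclass(frozen=True)
class Wild:
    """The wildcard L(state), identified by its BFSA state (sketch assumption)."""
    state: str


def state(delta: Delta, g) -> Optional[str]:
    """A(g) for a tree or wildcard-SSR pattern g (nested tuples, Wild leaves),
    computed in one bottom-up pass; None where A(g) is undefined."""
    if isinstance(g, Wild):
        return g.state
    child_states = []
    for child in g[1:]:
        s = state(delta, child)
        if s is None:
            return None
        child_states.append(s)
    return delta.get((g[0], tuple(child_states)))


# Hypothetical BFSA for the true/not type instances: "T" for type t, "F" for type f.
DELTA: Delta = {("true", ()): "T", ("not", ("T",)): "F", ("not", ("F",)): "T"}

print(state(DELTA, ("not", ("not", ("true",)))))  # T
print(state(DELTA, ("not", Wild("T"))))           # F
print(state(DELTA, ("and", ("true",))))           # None
```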

Patterns that are wildcard-SSR by a BFSA have two main advantages. First of all, whether or not a tree is matched by such a pattern can be computed in a single bottom-up traversal of the tree; Pelegri showed that this is true for any pattern where the wildcards are members of RECOG. Secondly, such patterns can easily be placed in a normal form. The use of wildcards instead of variables is motivated by the desire that equivalent patterns be equal. This is not the case for arbitrary wildcard-SSR patterns; for example, if $g_1 = a[L(s_1)[]]$ and $g_2 = a[b[L(s_2)[]]]$ where $L(s_1) = L(b[L(s_2)[]])$, then $g_1$ and $g_2$ are equivalent but not equal. $L(s_1) = L(b[L(s_2)[]])$ implies that $\langle b, s_2\rangle$ is the only pair with $\delta(b, s_2) = s_1$, i.e., there is only one transition in the automaton leading to the state $s_1$. This suggests the following definition of a normal form for patterns:

Definition 4 Given a BFSA $A$ with transition function $\delta$, a transition is a pattern $a[L(s_1)[], \ldots, L(s_n)[]]$ where $\delta(a, s_1 \cdots s_n)$ is defined. A transition $\sigma$ leads to the state $A(\sigma)$.

A pattern $g$ is in $A$-normal form if it is wildcard-SSR by $A$ and, for all wildcard positions $p$ of $g$, there are two different transitions $g_1$ and $g_2$ leading to $A(g@p)$.

Proposition 1 If $g_1$ and $g_2$ are in $A$-normal form, then $g_1 \equiv g_2$ iff $g_1 = g_2$.

Proof  If  the  patterns  are  equal,  they  are  equivalent. 

If they are equivalent, we will prove they are equal by induction on the height of the patterns. Suppose that the children of $g_1$ and $g_2$ are equal (this is vacuously true for the base case). If $g_1(\epsilon) \in O$ and $g_2(\epsilon) \in O$, then clearly the operators must be the same for the patterns to be equivalent, so the patterns are equal. If $g_1(\epsilon)$ and $g_2(\epsilon)$ are both wildcards, they must correspond to the same state in $A$ in order to match the same trees, and so the patterns are equal. Otherwise, suppose WLOG that $g_1(\epsilon)$ is a wildcard and $g_2(\epsilon) \in O$.

Since $g_1$ is in $A$-normal form, there must be two different transitions $g$ and $g'$ leading to $A(g_1)$. It can be shown that $L(g) \cup L(g') \subseteq L(g_1)$ and that $L(g) \cup L(g') \not\subseteq L(g_2)$; but this implies $g_1 \not\equiv g_2$, a contradiction. □

Proposition 2 If $g$ is wildcard-SSR by $A$, there exists a $g' \equiv g$ in $A$-normal form.

Proof Suppose $g(\epsilon)$ is a wildcard. We will proceed by induction on the height of a minimal-height tree in $L(g)$. If there are at least two transitions leading to $A(g)$, then $g$ is in $A$-normal form by definition. Otherwise, $g \equiv a[L(s_1)[], \ldots, L(s_n)[]]$ for some operator $a$ and wildcards $L(s_i)$, each of which has an equivalent $A$-normal form by the induction hypothesis; therefore, $g$ has an equivalent $A$-normal form.

If $g(\epsilon)$ is not a wildcard, then replacing each of its wildcards with an equivalent pattern in $A$-normal form yields an equivalent pattern in $A$-normal form. □
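The proof of Proposition 2 is constructive. The Python sketch below (hypothetical names, with the assumed encoding of $\delta$ as a dictionary from `(operator, child-state tuple)` to a state and wildcards as `Wild(s)` leaves) replaces every wildcard whose state has exactly one incoming transition by that transition, recursively. It terminates because every state is assumed useful: a cycle of states that each have a single incoming transition would have empty languages. The example is the situation described before Definition 4, where $\langle b, s_2\rangle$ is the only transition leading to $s_1$.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Wild:
    """The wildcard L(state), identified by its BFSA state (sketch assumption)."""
    state: str


def normalize(g, delta):
    """Return an A-normal form of the wildcard-SSR pattern g (Proposition 2).

    A wildcard L(s) whose state s has exactly one incoming transition
    a[L(s1), ..., L(sn)] is replaced by that transition, recursively.
    Termination relies on the assumption that every state is useful."""
    incoming = defaultdict(list)
    for (op, children), target in delta.items():
        incoming[target].append((op, children))

    def go(p):
        if isinstance(p, Wild):
            transitions = incoming[p.state]
            if len(transitions) == 1:
                op, children = transitions[0]
                return (op,) + tuple(go(Wild(s)) for s in children)
            return p
        return (p[0],) + tuple(go(c) for c in p[1:])

    return go(g)


delta = {("c", ()): "s2", ("d", ()): "s2", ("b", ("s2",)): "s1", ("a", ("s1",)): "s0"}
print(normalize(("a", Wild("s1")), delta))  # ('a', ('b', Wild(state='s2')))
print(normalize(("a", ("b", Wild("s2"))), delta) == normalize(("a", Wild("s1")), delta))  # True
```

The two equivalent patterns $g_1$ and $g_2$ from the text normalize to the same pattern, as Proposition 1 requires.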

Given patterns, we can now define rewrite rules. A rewrite rule is a partial function mapping trees to trees. The domain of the rule is specified by an input pattern, and the range by an output pattern; the function itself is specified by relating wildcard positions in the output pattern to wildcard positions in the input pattern:

Definition 5 A rewrite rule $r$ is of the form $\alpha \xrightarrow{w} \beta$, where $\alpha$ and $\beta$ are patterns called the input pattern and output pattern, respectively, and $w$ is a 1-1 function $w : \mathcal{W}(\beta) \to \mathcal{W}(\alpha)$ with $\alpha(w(p)) = \beta(p)$ for all $p \in \mathcal{W}(\beta)$.

A  rewrite  system  is  a  collection  of  rewrite  rules. 

A  rewrite  rule  is  in  A-normal  form  if  both  the  input  pattern  and  the 
output  pattern  are  in  A-normal  form.  A  rewrite  system  is  in  A-normal 
form  if  every  rewrite  rule  is  in  A-normal  form. 
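The rewrite rule $r = \alpha \xrightarrow{w} \beta$ is applicable to $t$ if $\alpha$ matches $t$. If $r$ is applicable to $t$, $r(t)$, the application of $r$ to $t$, is the tree

$$r(t) = \beta @_{p_1} \bigl(t@w(p_1)\bigr) @_{p_2} \cdots @_{p_n} \bigl(t@w(p_n)\bigr),$$

where $\mathcal{W}(\beta) = \{p_1, \ldots, p_n\}$; that is, $\beta$ with the wildcard at each position $p_i$ replaced by the subtree of $t$ at $w(p_i)$.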

Two rules $r_1$ and $r_2$ are equivalent, written $r_1 \equiv r_2$, if, for all trees $t$ and $t'$, $r_1(t) = t'$ iff $r_2(t) = t'$.
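The application $r(t)$ can be sketched in Python on the position-map encoding of trees. In this sketch (an assumption, not CRSTNA's representation) a pattern is a dictionary from positions to labels, a wildcard label is any callable membership test, and $w$ is a dictionary from $\mathcal{W}(\beta)$ to $\mathcal{W}(\alpha)$. Because wildcard positions are leaves, the replacements do not interfere, so their order does not matter.

```python
from typing import Tuple

Position = Tuple[int, ...]
# A tree or pattern is a dict from positions to labels.  In this sketch a
# wildcard label is any Python callable (a membership test on subtrees); any
# other label is an operator.


def at(f, p):
    """f@p: the subtree of f at position p."""
    n = len(p)
    return {q[n:]: x for q, x in f.items() if q[:n] == p}


def replace(f, p, g):
    """f @_p g: f with the subtree at p replaced by g."""
    n = len(p)
    result = {q: x for q, x in f.items() if q[:n] != p}
    result.update({p + s: x for s, x in g.items()})
    return result


def matches(g, t):
    """Operators of g agree with t, and t@p is in g(p) at each wildcard position p."""
    for p, x in g.items():
        if callable(x):
            if p not in t or not x(at(t, p)):
                return False
        elif t.get(p) != x:
            return False
    return True


def apply_rule(alpha, beta, w, t):
    """r(t) for r = alpha -w-> beta: beta with each wildcard position p replaced
    by t@w(p).  Returns None when alpha does not match t."""
    if not matches(alpha, t):
        return None
    result = beta
    for p, x in beta.items():
        if callable(x):
            result = replace(result, p, at(t, w[p]))
    return result


ANY = lambda s: True                       # the universal wildcard
alpha = {(): "+", (1,): ANY, (2,): "1"}    # +[X, 1]
beta = {(): "add1", (1,): ANY}             # add1[X]
w = {(1,): (1,)}                           # beta's wildcard (1,) comes from alpha's (1,)
t = {(): "+", (1,): "+", (1, 1): "reg1", (1, 2): "reg2", (2,): "1"}
print(apply_rule(alpha, beta, w, t))
# {(): 'add1', (1,): '+', (1, 1): 'reg1', (1, 2): 'reg2'}
```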

Proposition 3 If $r_1$ and $r_2$ are both rules in $A$-normal form, then $r_1 \equiv r_2$ iff $r_1 = r_2$.

Proof  If  the  rules  are  equal,  they  are  equivalent. 

If the rules are equivalent, then they have equal domains; this implies that their input patterns are equivalent, and since they are in $A$-normal form, they must be equal. Likewise, they have equal ranges, and therefore their output patterns are equal. Suppose $w_1 \ne w_2$; then there is some $p \in \mathcal{W}(\beta_1)$ with $w_1(p) \ne w_2(p)$. Since $\beta_1$ is in $A$-normal form, there are two different transitions $g$ and $g'$ leading to $A(\beta_1@p)$; this implies that there are two different trees $t_1$ and $t_2$ matching $\beta_1@p$. Let $t$ be a tree matching $\alpha_1$ with $t@w_1(p) = t_1$ and $t@w_2(p) = t_2$. It is easy to verify that such a tree exists and that $r_1(t) \ne r_2(t)$, a contradiction. □

Due in part to this nice property, we will from now on assume that all rewrite systems are in $A$-normal form. In section 3.2, we will show that these systems are equivalent in power to systems in which wildcards are allowed to be any member of RECOG.

We are now ready to define SET-REACHABILITY:

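Definition 6 A rewrite application is a pair $\langle r, p\rangle$ of a rule $r$ and a position $p$. $\langle r, p\rangle$ is applicable to $t$ if $r$ is applicable to $t@p$. If $\langle r, p\rangle$ is applicable to $t$, then $\langle r, p\rangle(t)$, the application of $\langle r, p\rangle$ to $t$, is the tree

$$t @_p r(t@p).$$

A rewrite sequence is a sequence of rewrite applications; a rewrite sequence $\langle r_1, p_1\rangle \cdots \langle r_n, p_n\rangle$ is applicable to $t$ if $\langle r_1, p_1\rangle$ is applicable to $t$ and $\langle r_2, p_2\rangle \cdots \langle r_n, p_n\rangle$ is applicable to $\langle r_1, p_1\rangle(t)$; its application to $t$ is then $(\langle r_2, p_2\rangle \cdots \langle r_n, p_n\rangle)(\langle r_1, p_1\rangle(t))$, and the empty sequence maps every tree to itself. A rewrite sequence is valid if it is applicable to some tree $t$.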
If every rewrite application in a sequence $\tau$ is of the form $\langle r_i, p/\!/q_i\rangle$ for some fixed position $p$, then $\tau@p$ denotes the rewrite sequence $\langle r_1, q_1\rangle \cdots \langle r_n, q_n\rangle$.

Let $\tau$ and $\phi$ be valid rewrite sequences. If, for all trees $t$, $\tau(t) = t'$ implies $\phi(t) = t'$, then $\phi$ covers $\tau$, denoted $\phi \sqsupseteq \tau$.

Let $\tau$ be a valid rewrite sequence. If there exists a rewrite rule $r$ such that, for all trees $t$ and $t'$, $\langle r, \epsilon\rangle(t) = t'$ iff $\tau(t) = t'$, then $r$ is the composition of $\tau$.


Let $R$ be a rewrite system over $\mathcal{O}$, and $G$ be a subset of $\mathcal{T}_{\mathcal{O}}$. The SET-REACHABILITY problem is, given $R$, $G$, and a tree $t$ over $\mathcal{O}$, to find a rewrite sequence $\tau$ such that $\tau(t) \in G$, or to show that there is no such sequence.
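Rewrite applications and rewrite sequences from Definition 6 can be sketched in Python as follows, assuming nested-tuple trees, positions as tuples of positive integers, and rules as partial functions that return `None` where undefined; the rule functions are hypothetical encodings of the rules in the example specification of Section 2.1. With this encoding, `t[i]` is the $i$th child, so $t@p$ is just repeated indexing.

```python
from typing import Callable, Optional, Sequence, Tuple

Tree = tuple                               # (label, child1, ..., childn)
Position = Tuple[int, ...]                 # positive integers; () is the root
Rule = Callable[[Tree], Optional[Tree]]    # partial: None where undefined


def at(t: Tree, p: Position) -> Tree:
    """t@p; t[i] is the i-th child because t[0] holds the label."""
    for i in p:
        t = t[i]
    return t


def replace_at(t: Tree, p: Position, new: Tree) -> Tree:
    """t @_p new: t with the subtree at position p replaced by new."""
    if not p:
        return new
    i = p[0]
    return t[:i] + (replace_at(t[i], p[1:], new),) + t[i + 1:]


def apply_application(rule: Rule, p: Position, t: Tree) -> Optional[Tree]:
    """<r, p>(t) = t @_p r(t@p), or None if <r, p> is not applicable to t."""
    try:
        sub = at(t, p)
    except IndexError:                     # p is not a position of t
        return None
    image = rule(sub)
    return None if image is None else replace_at(t, p, image)


def apply_sequence(seq: Sequence[Tuple[Rule, Position]], t: Tree) -> Optional[Tree]:
    """tau(t) for tau = <r1, p1> ... <rn, pn>, applied left to right;
    None if tau is not applicable to t."""
    for rule, p in seq:
        t = apply_application(rule, p, t)
        if t is None:
            return None
    return t


def commute(t):    # +[X, Y] -> +[Y, X]
    return ("+", t[2], t[1]) if t[0] == "+" else None


def incr(t):       # +[X, 1] -> add1[X]
    return ("add1", t[1]) if t[0] == "+" and t[2] == ("1",) else None


def drop_zero(t):  # +[X, 0] -> X
    return t[1] if t[0] == "+" and t[2] == ("0",) else None


t = ("+", ("1",), ("+", ("x",), ("0",)))
print(apply_sequence([(drop_zero, (2,)), (commute, ()), (incr, ())], t))  # ('add1', ('x',))
print(apply_sequence([(incr, ())], t))                                    # None
```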


In order to solve SET-REACHABILITY efficiently, we constrain the number of rewrite sequences that must be considered. If we can show that some sequence $\tau$ is covered by a different sequence $\phi$, then there is no need to consider $\tau$; $\phi$ will provide an adequate solution for SET-REACHABILITY whenever $\tau$ would. Our first "pruning" of the set of all rewrite sequences will be to consider only those sequences in compositional normal form (CNF).

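Definition 7 Let $\tau$ be a valid rewrite sequence. $\tau$ is in compositional normal form at $\epsilon$ if it is of the form $\tau_1 \cdots \tau_n \tau_\epsilon$ such that

(1) for $1 \le i \le n$, all rewrite applications in $\tau_i$ have positions whose head is $i$, and

(2) $\tau_\epsilon = \langle r_1, p_1\rangle \langle r_2, p_2\rangle \cdots \langle r_m, p_m\rangle$ where $\langle r_1, p_1\rangle = \langle \alpha_1 \xrightarrow{w_1} \beta_1, \epsilon\rangle$ and, for $1 \le i < m$, $\langle r_1, p_1\rangle \cdots \langle r_i, p_i\rangle$ has a composition $\alpha_i \xrightarrow{w_i} \beta_i$ with $\beta_i(p_{i+1}) \in O$.

$\tau$ is in compositional normal form (CNF) if it is in CNF at $\epsilon$, $\langle r_2, p_2\rangle \cdots \langle r_m, p_m\rangle$ is in CNF, and, for all $i$, $\tau_i@i$ is in CNF.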

Our solution to SET-REACHABILITY will involve a single bottom-up traversal of the tree computing all of the "interesting" rewrite sequences (those that we cannot show are covered by others) to be applied at each position. Given this approach, the bottom-up nature of CNF (i.e., all rewrites at subtrees are done before rewrites at the root) is necessary. It also places a strong constraint on the rewrite systems which can be handled: it cannot be possible for the absence of one rewrite to enable a second rewrite arbitrarily far down in the tree. We enforce this constraint by insisting that the rewrite system be type-closed:
Definition 8 Two patterns $g_1$ and $g_2$ are similar, written $g_1 \sim g_2$, if $\mathcal{D}(g_1) = \mathcal{D}(g_2)$, $\mathcal{W}(g_1) = \mathcal{W}(g_2)$, and $g_1(p) \ne g_2(p)$ implies $p \in \mathcal{W}(g_1)$.

Two rewrite rules $r_1 = \alpha_1 \xrightarrow{w_1} \beta_1$ and $r_2 = \alpha_2 \xrightarrow{w_2} \beta_2$ are similar, written $r_1 \sim r_2$, if $w_1 = w_2$, $\alpha_1 \sim \alpha_2$, and $\beta_1 \sim \beta_2$.

A rewrite system $R$ is type-closed if, for all rules $r = \alpha \xrightarrow{w} \beta$ and trees $t$ with $r$ applicable to $t$, for all positions $p \in \mathcal{W}(\alpha)$, and for all rewrite sequences $\tau$ in $R$ applicable to $t@p$, there exists a rule $r' \sim r$ such that $r'$ is applicable to $t @_p \tau(t@p)$.
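Similarity is a purely structural check. The Python sketch below (position-map patterns with a hypothetical `Wild(state)` label for wildcards, and rules as `(alpha, beta, w)` triples, all assumptions of this sketch) tests $g_1 \sim g_2$ and $r_1 \sim r_2$. Similar rules differ only in the types of their wildcards, which is what type-closure relies on: rewriting below a wildcard may change the type of that subtree, but some rule of the same shape must still apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wild:
    """The wildcard L(state), identified by a BFSA state (sketch assumption)."""
    state: str


def wildcard_positions(g):
    """W(g) for a pattern stored as a dict from positions to labels."""
    return {p for p, x in g.items() if isinstance(x, Wild)}


def similar_patterns(g1, g2):
    """g1 ~ g2: same positions, same wildcard positions, and any difference in
    labels occurs only at a wildcard position."""
    w1 = wildcard_positions(g1)
    if g1.keys() != g2.keys() or w1 != wildcard_positions(g2):
        return False
    return all(p in w1 for p in g1 if g1[p] != g2[p])


def similar_rules(r1, r2):
    """r1 ~ r2 for rules given as (alpha, beta, w) triples."""
    (alpha1, beta1, w1), (alpha2, beta2, w2) = r1, r2
    return w1 == w2 and similar_patterns(alpha1, alpha2) and similar_patterns(beta1, beta2)


# +[L(T), 1] -> add1[L(T)] and +[L(F), 1] -> add1[L(F)] differ only in wildcard types.
r_t = ({(): "+", (1,): Wild("T"), (2,): "1"}, {(): "add1", (1,): Wild("T")}, {(1,): (1,)})
r_f = ({(): "+", (1,): Wild("F"), (2,): "1"}, {(): "add1", (1,): Wild("F")}, {(1,): (1,)})
r_0 = ({(): "+", (1,): Wild("T"), (2,): "0"}, {(): "add1", (1,): Wild("T")}, {(1,): (1,)})
print(similar_rules(r_t, r_f), similar_rules(r_t, r_0))  # True False
```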

Section 3.2 gives the reason behind the name "type-closed". Given a type-closed system, we need only consider the CNF rewrite sequences in order to solve SET-REACHABILITY:

Proposition 4 Let $R$ be a type-closed rewrite system, and let $\phi$ be a valid rewrite sequence in $R$. There exists a rewrite sequence $\tau$ in CNF that covers $\phi$.

In  order  to  prove  this  proposition,  we  need  some  lemmas. 

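Lemma 4.1 Let $\tau = \langle \alpha_1 \xrightarrow{w_1} \beta_1, \epsilon\rangle \langle \alpha_2 \xrightarrow{w_2} \beta_2, p\rangle$ be a valid rewrite sequence. If $\beta_1(p) \in O$, then $\tau$ has a composition.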

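Proof It is straightforward (but tedious) to confirm that the composition of $\tau$ is $\alpha \xrightarrow{w} \beta$, where

$$\alpha(p') = \begin{cases} \alpha_2(q/\!/s) & \text{if } p' = w_1(p/\!/q)/\!/s \text{ for some } q, s \text{ with } p/\!/q \in \mathcal{W}(\beta_1) \text{ and } q/\!/s \in \mathcal{D}(\alpha_2), \\ \alpha_1(p') & \text{otherwise,} \end{cases}$$

$$\beta(p') = \begin{cases} \beta_2(q) & \text{if } p' = p/\!/q \text{ and } \beta_2(q) \in O, \\ \beta_1(p/\!/w_2(q)/\!/s) & \text{if } p' = p/\!/q/\!/s \text{ for some } q \in \mathcal{W}(\beta_2),\ s \in \mathcal{P} \text{ with } p/\!/w_2(q)/\!/s \in \mathcal{D}(\beta_1), \\ \beta_2(q) & \text{if } p' = p/\!/q \text{ for some } q \in \mathcal{W}(\beta_2) \text{ with } p/\!/w_2(q) \notin \mathcal{D}(\beta_1), \\ \beta_1(p') & \text{if } p \text{ is not a prefix of } p', \end{cases}$$

$$w(p') = \begin{cases} w_1(p/\!/w_2(q)/\!/s) & \text{if } p' = p/\!/q/\!/s \text{ for some } q \in \mathcal{W}(\beta_2),\ s \in \mathcal{P} \text{ with } p/\!/w_2(q)/\!/s \in \mathcal{W}(\beta_1), \\ w_1(p/\!/q_1)/\!/q_2 & \text{if } p' = p/\!/q \text{ for some } q \in \mathcal{W}(\beta_2) \text{ with } w_2(q) = q_1/\!/q_2 \text{ and } p/\!/q_1 \in \mathcal{W}(\beta_1), \\ w_1(p') & \text{otherwise.} \end{cases}$$

□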


The  definition  of  a  type-closed  rewrite  system  is  all  that  is  needed  to 
yield  the  following  lemma: 

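Lemma 4.2 If $r_1 = \alpha \xrightarrow{w} \beta$ is a rewrite rule in a type-closed system $R$ and $p \in \mathcal{W}(\beta)$, then for all rules $r_2 \in R$ and positions $q$, there exists a rule $r'_1 \sim r_1$ such that $\langle r_2, w(p)/\!/q\rangle \langle r'_1, \epsilon\rangle \sqsupseteq \langle r_1, \epsilon\rangle \langle r_2, p/\!/q\rangle$.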

Rewrites in non-intersecting subtrees can be exchanged, since their requirements and effects are completely independent.

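Lemma 4.3 Let $\langle r_1, p_1\rangle \langle r_2, p_2\rangle$ be a valid rewrite sequence. If $p_1$ is not a prefix of $p_2$ and $p_2$ is not a prefix of $p_1$, then $\langle r_2, p_2\rangle \langle r_1, p_1\rangle \sqsupseteq \langle r_1, p_1\rangle \langle r_2, p_2\rangle$.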


We  can  now  prove  proposition  4: 


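Proof We construct a series of rewrite sequences $\phi^0, \phi^1, \ldots, \phi^k$, each covering $\phi$, such that

(*) $\phi^i$ is of the form $\phi^i_1 \cdots \phi^i_n \phi^i_\epsilon$, where $\phi^i_l$ only has rewrites at positions with $l$ as their head.

Define $\phi^0 = \phi$; $\phi^0 \sqsupseteq \phi$, and it satisfies (*) with $\phi^0_l = \epsilon$ for all $l$ and $\phi^0_\epsilon = \phi$.

Suppose $\phi^i$ is not in CNF at $\epsilon$. Let $\langle r_1, p_1\rangle \cdots \langle r_m, p_m\rangle$ be the applications in $\phi^i_\epsilon$. Let $j$ be the smallest integer such that $\langle r_1, p_1\rangle \cdots \langle r_j, p_j\rangle$ fails to satisfy condition (2) of Definition 7, and let $\phi'_\epsilon = \langle r_1, p_1\rangle \cdots \widehat{\langle r_j, p_j\rangle} \cdots \langle r_m, p_m\rangle$. Due to Lemma 4.1, either $j$ is 1 or $p_j$ has some wildcard position of the composition of $\langle r_1, p_1\rangle \cdots \langle r_{j-1}, p_{j-1}\rangle$ as a prefix; therefore, according to Lemma 4.2, there exist a position $p$ and a sequence $\phi''_\epsilon$, obtained from $\phi'_\epsilon$ by replacing rules with similar rules, such that $\langle r_j, p\rangle \phi''_\epsilon$ covers $\phi^i_\epsilon$; thus $\phi^i_1 \cdots \phi^i_n \langle r_j, p\rangle \phi''_\epsilon$ covers $\phi^i$.

If $p = \epsilon$, let $\phi^{i+1}_l = \phi^i_l$ for all $l$ and $\phi^{i+1}_\epsilon = \langle r_j, \epsilon\rangle \phi''_\epsilon$. Otherwise, $p = l/\!/s$ for some integer $l$; according to Lemma 4.3, $\phi^i_1 \cdots \phi^i_l \langle r_j, p\rangle \phi^i_{l+1} \cdots \phi^i_n \phi''_\epsilon$ covers $\phi^i_1 \cdots \phi^i_n \langle r_j, p\rangle \phi''_\epsilon$. Let $\phi^{i+1}_k = \phi^i_k$ for $k \ne l$, $\phi^{i+1}_l = \phi^i_l \langle r_j, p\rangle$, and $\phi^{i+1}_\epsilon = \phi''_\epsilon$.

Let $\phi^{i+1} = \phi^{i+1}_1 \cdots \phi^{i+1}_n \phi^{i+1}_\epsilon$; $\phi^{i+1} \sqsupseteq \phi$, and it satisfies (*).

On each step of this construction, one rewrite application in $\phi^i_\epsilon$ either moves from a non-empty position to the empty position or is moved into some $\phi^{i+1}_l$. Therefore, the construction must terminate at some point with $\phi^k$ in CNF at $\epsilon$. Recursively applying the construction to the $\phi^k_l$ and the tail of $\phi^k_\epsilon$ produces the desired $\tau$.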
Recall that our solution for SET-REACHABILITY involves a bottom-up traversal of the tree that computes all of the interesting rewrite sequences applicable at a given position. In a CNF rewrite sequence, these sequences are called compositional local rewrite sequences:

Definition 9 Let $\tau = \tau_1 \cdots \tau_n \tau_\varepsilon$ be a valid CNF rewrite sequence satisfying the conditions in Definition 7. The compositional local rewrite sequence (LRS) assigned by $\tau$ to a position $p$ is defined by $C(\tau, p)$, where

(1) $C(\tau, \varepsilon) = \tau_\varepsilon$, and

(2) $C(\tau, i//p) = C(\tau_i, p)$.
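The recursion in Definition 9 can be sketched as follows. The `CNFSequence` class is a hypothetical encoding, not CRSTNA's data structure: `children[i - 1]` holds τ_i and `root` holds τ_ε, with rule applications kept as opaque labels. Treating a child with no recorded sequence as having an empty LRS is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CNFSequence:
    """Hypothetical encoding of a CNF rewrite sequence τ = τ_1 ⋯ τ_n τ_ε.

    children[i - 1] is τ_i (itself in CNF, rewriting only inside subtree i);
    root is τ_ε, the applications performed at the root afterwards.
    """
    children: list = field(default_factory=list)
    root: list = field(default_factory=list)


def lrs(tau, p):
    """C(τ, p): the local rewrite sequence that τ assigns to position p."""
    if not p:                      # (1) C(τ, ε) = τ_ε
        return tau.root
    i, rest = p[0], p[1:]          # p = i // rest
    if i > len(tau.children):      # assumption: a missing τ_i has no rewrites
        return []
    return lrs(tau.children[i - 1], rest)   # (2) C(τ, i//p) = C(τ_i, p)


tau = CNFSequence(
    children=[CNFSequence(root=["ident -> expr"]), CNFSequence(root=["1 -> expr"])],
    root=["+[expr, expr] -> expr"],
)
```

Here `lrs(tau, (2,))` returns `["1 -> expr"]`, and `lrs(tau, ())` returns the root sequence.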

Pelegri has devised an algorithm for computing all of the possible local rewrite sequences, and this algorithm is easily adapted for typed rewrite systems and compositional local rewrite sequences. Unfortunately, there is sometimes an infinite number of sequences, even when many of them are unnecessary for solving REACHABILITY. We therefore further constrain the sequences we consider by insisting that they be efficient.⁴

Definition 10 Let $G$ be a set of trees and let $\tau$ be a rewrite sequence in CNF of the form $\tau_1 \cdots \tau_n \tau_\varepsilon$ satisfying the conditions in Definition 7. $\tau$ is efficient with respect to $G$ if

(1) there is no CNF rewrite sequence $\tau'_\varepsilon$ shorter than $\tau_\varepsilon$ such that, for all trees $t$ with $\tau_\varepsilon(t) \in G$, $\tau'_\varepsilon(t) \in G$, and

(2) each $\tau_i$ is efficient with respect to $\{\, t@i \mid \tau_\varepsilon(t) \in G \,\}$.

Proposition 5 Let $R$ be a rewrite system in $A$-normal form and let $S$ be a subset of $S_A$. Let $G = \bigcup_{s \in S} L(s)$, and let $t$ be a tree. If there is a rewrite sequence $\phi \in R$ with $\phi(t) \in G$, there is a rewrite sequence $\tau \in R$ efficient with respect to $G$ with $\tau(t) \in G$.

Proof Let $\tau = \tau_1 \cdots \tau_n \tau_\varepsilon$ be a minimal-length rewrite sequence in CNF satisfying the conditions in Definition 7, with $\tau(t) \in G$. Such a sequence must exist, since $\phi$ is covered by some CNF sequence.

Suppose $\tau$ is not efficient with respect to $G$. Then either

⁴ This is stricter than Pelegri's definition of “efficient”.
(1) there is a $\tau'_\varepsilon$ shorter than $\tau_\varepsilon$ such that, for all trees $t$ with $\tau_\varepsilon(t) \in G$, $\tau'_\varepsilon(t) \in G$, or

(2) there is some $\tau_i$ that is not efficient with respect to $\{\, t'@i \mid \tau_\varepsilon(t') \in G \,\}$.

(1) is impossible, since it implies that $(\tau_1 \cdots \tau_n \tau'_\varepsilon)(t) \in G$, contradicting the choice of $\tau$. If (2) is true, then there is some shorter sequence $\tau'_i$ that can replace $\tau_i$ and still result in a valid rewrite sequence $\tau'$ applicable to $t$. $A(\tau'(t))$ is independent of $t$ and $\tau'_i$; it only depends on the output pattern of $\tau_\varepsilon$, and therefore $\tau'(t) \in G$, again contradicting the choice of $\tau$. □

Therefore, in solving SET-REACHABILITY, if the goal set corresponds to a set of states of A, we can restrict our search to rewrite sequences efficient with respect to the goal set and still be assured of finding a solution sequence if there is one.

Given a set of local rewrite sequences, Pelegri showed how to modify David Chase's algorithm for pattern matching [1] to compute all possible rewrite sequences composed from the local rewrite sequences. We end this section by showing how to compute all possible rewrite sequences that occur in efficient rewrite sequences; Pelegri's algorithms are then used to construct the SET-REACHABILITY solver.

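Definition 11 Given a rewrite system $R$ in $A$-normal form and a set $G$ of goal states in $S_A$, construct the following sets:

$$O_0 = \{\, L(s)[\,] \mid s \in G \,\}, \qquad I_0 = \emptyset, \qquad U_0 = \{\varepsilon\},$$

$$O_{i+1} = O_i \cup \{\, \rho@j \mid \rho \in I_i,\ j \text{ a child position of } \rho \,\},$$

$$\begin{aligned}
I_{i+1} = I_i \cup \{\, \rho \mid{} & \exists\, \tau \in U_i,\ \beta' \in O_i, \text{ and positions } p_1, \ldots, p_n \text{ such that} \\
& \tau \text{ has composition } \alpha \to \beta, \\
& \{p_1, \ldots, p_n\} = W(\beta) \cap V(\beta'), \\
& \rho = \alpha \otimes_{p_1} \beta'(p_1) \cdots \otimes_{p_n} \beta'(p_n), \text{ and} \\
& L(\beta) \cap L(\beta') \neq \emptyset \,\} \\
\cup \{\, \gamma \mid{} & \gamma \text{ is a transition leading to } s, \text{ where } L(s)[\,] \in O_i \,\},
\end{aligned}$$

$$U_{i+1} = U_i \cup \{\, \tau = (r,\varepsilon)\tau' \mid \tau' \text{ is in CNF, every LRS assigned by } \tau' \text{ is in } U_i, \text{ and } \tau \text{ is efficient with respect to } L(o) \text{ for some } o \in O_i \,\},$$

$$O = \bigcup_i O_i, \qquad I = \bigcup_i I_i, \qquad U = \bigcup_i U_i.$$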

The  useful  local  rewrite  sequences  are  the  members  of  the  set  U,  and 
the  extended  pattern  set  of  (R,G)  is  the  union  of  I  and  O. 

Proposition 6 Let $R$ be a rewrite system in $A$-normal form, and $G$ a set of goal states in $S_A$. If $\tau$ is an efficient rewrite sequence with respect to $L(w)$ for some $w$ in $G$, the local rewrite sequences assigned by $\tau$ are useful local rewrite sequences with respect to $R$ and $G$.

Proof The detailed proof is quite long; we sketch the main ideas here. We will show that, for any pattern $o$ in $O$, any rewrite sequence efficient with respect to $L(o)$ is composed of local rewrite sequences in $U$.

We proceed by induction on the length of $\tau$. Let $\tau_p$ be the local rewrite sequence assigned at $p$ by $\tau$, and define the sets $G_\varepsilon = L(w)$ and $G_{p//i} = \{\, t@i \mid \tau_p(t) \in G_p \,\}$. Since $\tau$ is efficient with respect to $G_\varepsilon$, $\tau_p$ is efficient with respect to $G_p$. It can be shown by induction that $G_p$ is equivalent to $L(o)$ for some output pattern $o \in O$; combining this with the fact that the local rewrite sequence $\tau_p$ is composed of efficient local rewrite sequences, which are in $U$ by the induction hypothesis, the construction of $U_{i+1}$ from $U_i$ ensures that $\tau_p$ is also in $U$. □

Note that the construction of $U$ may be infinite. CRSTNA has no test for this possibility, and therefore may fail to terminate. I believe that such a test can be constructed based on the ideas discussed in Section 2.3. The extended pattern set constructed simultaneously with $U$ is an adequate replacement for the extended pattern set defined by Pelegri; it is used in the table construction process.

Optimality of the rewrite sequence found by CRSTNA has been treated only lightly in this section, since most of the complication arises in the work done by Pelegri. If each rule has an associated non-negative cost, and the cost of a rewrite sequence is the sum of the costs of the rules in the sequence, then the definition of efficient can be modified to require a sequence with lowest cost, rather than a shortest sequence; combined with Pelegri's work, CRSTNA then finds a least-cost sequence leading to a tree in the goal set.


3.2  Internalizing  the  specification 

According to the theory in the previous section, the wildcards used by a rewrite system are severely restricted: they must all correspond to individual states in a single BFSA. But CRSTNA's specification language allows wildcards to be arbitrary members of RECOG. In this section we show that the rewrite systems defined previously are equivalent in power to those written in CRSTNA's specification language, and show the motivation for the definition of a type-closed rewrite system.


Proposition 7 Let $\mathcal{R}$ be the set of all rewrite systems $R$ such that $R$ is in $A$-normal form for some BFSA $A$. Let $\mathcal{R}^*$ be the set of rewrite systems in which every wildcard is in RECOG.

$\mathcal{R}$ is as powerful as $\mathcal{R}^*$, i.e., for every $R^* \in \mathcal{R}^*$ there is a corresponding $R \in \mathcal{R}$ such that, for every tree $t$ and rewrite sequence $\tau^* \in R^*$, there is a rewrite sequence $\tau \in R$ with $\tau(t) = \tau^*(t)$.


Proof Let $R^* \in \mathcal{R}^*$, and let $A$ be a BFSA that simultaneously recognizes every wildcard in $R^*$, i.e., every wildcard in $R^*$ is equal to $\bigcup_i L(s_i)$ for some subset $\{s_1, \ldots, s_n\}$ of $S_A$. That some appropriate $A$ exists is a basic result of BFSA theory.

Let $r^* = \alpha^* \to \beta^*$ be a rewrite rule in $R^*$. We will construct a set of rules $R_{r^*}$ such that, if $r^*(t) = t'$, there exists a rule $r \in R_{r^*}$ with $r(t) = t'$; the union of the sets $R_{r^*}$ for all rewrite rules in $R^*$ forms the desired $R$. Let $\{p_1, \ldots, p_n\} = W(\alpha^*)$, and let $S_i$ be the set of states corresponding to $\alpha^*(p_i)$. Then

$$R_{r^*} = \{\, \alpha \to \beta^* \mid \alpha = \alpha^* \otimes_{p_1} L(s_1)[\,] \cdots \otimes_{p_n} L(s_n)[\,] \text{ for some } (s_1, \ldots, s_n) \in S_1 \times \cdots \times S_n \,\}.$$

□
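The construction of $R_{r^*}$ amounts to instantiating each wildcard with every state in its set and taking the Cartesian product. The Python sketch below assumes a toy encoding (a wildcard of type T written `("?", T)`, the pattern L(s)[] written `("L", s)`) and leaves the right-hand side untouched as an opaque label; all names are hypothetical. With one type recognized by three states, a pattern with two wildcards of that type yields 3² = 9 rules, the s² growth noted in Section 3.3.

```python
from itertools import product

# Toy encoding (assumption): a pattern is (operator, child_1, ..., child_n);
# a wildcard of type T is ("?", T); after expansion ("L", s) stands for L(s)[].


def wildcard_types(pattern):
    """Yield the types of the wildcards in pattern, left to right."""
    if pattern[0] == "?":
        yield pattern[1]
    else:
        for child in pattern[1:]:
            yield from wildcard_types(child)


def fill(pattern, states):
    """Replace wildcards, left to right, with states drawn from an iterator."""
    if pattern[0] == "?":
        return ("L", next(states))
    return (pattern[0],) + tuple(fill(c, states) for c in pattern[1:])


def expand_rule(lhs, rhs, states_of_type):
    """Return one rule per tuple (s_1, ..., s_n) in S_1 x ... x S_n."""
    choices = [states_of_type[ty] for ty in wildcard_types(lhs)]
    return [(fill(lhs, iter(combo)), rhs) for combo in product(*choices)]


lhs = ("+", ("?", "t"), ("?", "t"))
rules = expand_rule(lhs, "rhs", {"t": ["s1", "s2", "s3"]})
```

A pattern with no wildcards expands to exactly one rule, since the product of zero sets contains the single empty tuple.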


The  definition  of  a  type-closed  rewrite  system  arises  from  the  notion  of 
a  rewrite  system  with  recognizable  wildcards  in  which  every  type  is  closed 
under  the  system. 
Definition 12 A set of trees $T$ is closed under a rewrite system $R$ if, for all trees $t \in T$ and all rewrite sequences $\tau$ in $R$ applicable to $t$, $\tau(t) \in T$.
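For a finite set of trees, Definition 12 can be checked one rewrite step at a time: if every single rewrite of a member stays in the set, then by induction on sequence length every rewrite sequence does too. The sketch below assumes ground rules and tuple-encoded trees, and the function names are hypothetical; real types are usually infinite recognizable sets, so this enumeration is only illustrative.

```python
def positions(t, prefix=()):
    """Yield every position of t, root first; () stands for ε."""
    yield prefix
    for i, child in enumerate(t[1:], start=1):
        yield from positions(child, prefix + (i,))


def subtree(t, p):
    for i in p:
        t = t[i]
    return t


def replace_at(t, p, new):
    if not p:
        return new
    i = p[0]
    return t[:i] + (replace_at(t[i], p[1:], new),) + t[i + 1:]


def one_step(rules, t):
    """Yield every tree obtained from t by one application of a ground rule."""
    for p in positions(t):
        for lhs, rhs in rules:
            if subtree(t, p) == lhs:
                yield replace_at(t, p, rhs)


def is_closed(trees, rules):
    """Check closure of a finite tree set under one-step rewriting.

    By induction on sequence length, closure under single rewrites implies
    closure under every rewrite sequence, as Definition 12 requires.
    """
    members = set(trees)
    return all(s in members for t in members for s in one_step(rules, t))


rules = [(("ident",), ("expr",))]
```

The set `{ident[], expr[]}` is closed under this one-rule system, while `{ident[]}` alone is not.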

Proposition 8 Let $R^*$ be a rewrite system in which every wildcard is a recognizable set closed under $R^*$. Let $R$ be the rewrite system corresponding to $R^*$ according to Proposition 7. Then $R$ is type-closed.

Thus, given a rewrite system $R^*$ whose types are recognizable sets closed under $R^*$, Proposition 7 shows how to construct a type-closed rewrite system equivalent to $R^*$, allowing us to use the theory in the previous section to solve SET-REACHABILITY.

Unfortunately, the type system specified by the user for R in terms of type instances may not be closed under the rewrite system. CRSTNA obtains a closed type system by solving SET-REACHABILITY for a related rewrite system R' defined by the following specification:

•  Type names in R are treated as nullary operators in R'.

•  For each type instance x is-a y, R' has a rule x' → y[], where x' is obtained from x by replacing each variable with the name of its type.

•  For each rule x → y in R, R' has a rule y' → x', where x' and y' are obtained from x and y by replacing each variable with the name of its type.

•  R' has a single type g, distinct from all types in R.

•  For each type name x in R, R' has a type instance x[] is-a g.

For example, given the specification

ident[] is-a expr

1[] is-a expr

+[X : expr[], Y : expr[]] is-a expr

+[X : expr[], 1[]] → add1[X : expr[]]

we would create the system

ident[] → expr[]

1[] → expr[]

+[expr[], expr[]] → expr[]

add1[expr[]] → +[expr[], 1[]]
Now if t is a type name in R, let the type named by t be the set of all trees that can be rewritten into t by R'; in our example, trees with add1 at the root and an expr as the child can be rewritten into expr, in addition to those originally specified with type instances.
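The definition of the type named by t can be illustrated by brute force on the example system R'. The sketch below runs a breadth-first search over one-step rewrites and reports which type names are reachable. This is not how CRSTNA computes types (it solves SET-REACHABILITY with automata), and the search is capped because the set of reachable trees may be infinite in general. Trees are tuple-encoded and all names are hypothetical.

```python
from collections import deque

# The system R' from the example, as ground rules (lhs, rhs) over trees
# encoded as tuples (operator, child_1, ..., child_n).
R_PRIME = [
    (("ident",), ("expr",)),
    (("1",), ("expr",)),
    (("+", ("expr",), ("expr",)), ("expr",)),
    (("add1", ("expr",)), ("+", ("expr",), ("1",))),
]
TYPE_NAMES = {"expr"}


def positions(t, prefix=()):
    """Yield every position of t, root first; () stands for ε."""
    yield prefix
    for i, child in enumerate(t[1:], start=1):
        yield from positions(child, prefix + (i,))


def subtree(t, p):
    for i in p:
        t = t[i]
    return t


def replace_at(t, p, new):
    if not p:
        return new
    i = p[0]
    return t[:i] + (replace_at(t[i], p[1:], new),) + t[i + 1:]


def one_step(rules, t):
    """Yield every tree reachable from t by a single rule application."""
    for p in positions(t):
        for lhs, rhs in rules:
            if subtree(t, p) == lhs:
                yield replace_at(t, p, rhs)


def types_of(t, rules=R_PRIME, type_names=TYPE_NAMES, limit=10_000):
    """Type names that t can be rewritten into, found by breadth-first search.

    The search stops after about `limit` distinct trees, since the set of
    reachable trees can be infinite in general; a capped search can then
    miss a type, so this is an illustration rather than a decision procedure.
    """
    seen = {t}
    queue = deque([t])
    while queue and len(seen) <= limit:
        for v in one_step(rules, queue.popleft()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return {name for name in type_names if (name,) in seen}
```

For instance, `add1[ident[]]` reaches `expr[]` via `add1[expr[]]`, `+[expr[], 1[]]`, and `+[expr[], expr[]]`, so `types_of(("add1", ("ident",)))` returns `{"expr"}`.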

The resulting type system is the minimal system that contains the original type instances and is closed under R. Determining these types is simply the SET-REACHABILITY problem for the rewrite system R' with goal type g; the algorithms in this report can solve SET-REACHABILITY for any rewrite system without variables, and therefore can solve this problem for R'.

3.3  Programming  details 

The system is written in about 2000 lines of Common Lisp. It is a relatively straightforward implementation of the theory presented in this paper; very little optimization was done. The following basic design choices were made.

•  Sets of patterns are used heavily in Chase's algorithm; the most common operations on them are intersection, union, and equality testing. For these reasons, an ordered set representation is used, with patterns hashed to ensure that equal patterns are represented by the same data object. Profiling suggests that this was a good choice.

•  The implementation of Definition 11 is both important and difficult; CRSTNA spends most of its time in this construction. Since compositions are expensive to compute, CRSTNA keeps a list of compositions that may eventually satisfy the condition required to add a composition to U. It is not clear whether or not this was a good choice: CRSTNA has severe problems with space, but the alternative of repeatedly composing rules, discovering that they are not yet valid, garbage collecting the composition, and composing the rules again does not sound too promising. At the very least, the compositions that are formed should be carefully screened. The current implementation forms any composition that is in bottom-up normal form using useful rewrite sequences that have already been discovered; this is excessive, since in many cases the “useful” rewrite sequences are useless in the particular context.
•  The automaton produced by solving SET-REACHABILITY for the rewrite system R' described in the previous section is more powerful than is necessary to determine types: it contains information needed to construct a rewrite sequence from a given tree to its type name, while all that is needed for the type automaton is to know whether such a sequence exists. Thus, the automaton has more states than are strictly required.

This is disastrous; the primary reason CRSTNA has space problems is that, given a rule with the pattern +[X : t[], Y : t[]], CRSTNA has to make s² copies, where s is the number of states corresponding to t. For rewrite systems the size of machine descriptions, it is imperative to minimize the type automaton. CRSTNA has this capability, and it was used in the experiments described below. A representation that optimized the amount of space occupied by a rule at the expense of time spent manipulating rules might also be a win.
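The first design choice above (hashed patterns plus an ordered set representation) can be sketched as follows. This Python version is an illustration under stated assumptions, not the Common Lisp code: patterns are interned so that equal patterns share one integer id, and a set of patterns is a sorted tuple of ids, so intersection and union are linear merges and equality testing is a plain tuple comparison. All names are hypothetical.

```python
class PatternTable:
    """Hash-consing: structurally equal patterns get one shared integer id."""

    def __init__(self):
        self._ids = {}
        self.patterns = []

    def intern(self, pattern):
        pid = self._ids.get(pattern)
        if pid is None:
            pid = len(self.patterns)
            self._ids[pattern] = pid
            self.patterns.append(pattern)
        return pid


def make_set(ids):
    """An ordered set is a sorted tuple of distinct pattern ids."""
    return tuple(sorted(set(ids)))


def intersection(a, b):
    """Linear-time merge of two ordered sets, keeping common ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return tuple(out)


def union(a, b):
    """Linear-time merge of two ordered sets, keeping every id once."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return tuple(out)


table = PatternTable()
x = table.intern(("+", ("expr",), ("expr",)))
y = table.intern(("+", ("expr",), ("expr",)))   # same id as x
z = table.intern(("add1", ("expr",)))
```

Because equal patterns map to one id, comparing two pattern sets never has to walk the patterns themselves.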


4  Table  sizes  for  a  code  generator 

Types make possible the specification of SET-REACHABILITY, eliminating the need for semantic actions in many applications. They also give the rewrite system designer greater control over when rewrites will be applied. The cost of this greater control is larger tables that must simultaneously track input patterns and tree types.

In order to see if the table sizes were likely to be practical, CRSTNA was used to generate tables for a code generator for the Motorola 68000. The machine description was written from scratch, assuming a low-level intermediate representation (e.g., using machine types and explicit addressing calculations) as input. Costs were not used. The description was written in about a day, and has 149 rewrite rules and 178 type instances; a comparable machine description written for BURS uses 520 rewrite rules. The ability to use types and variables is therefore significant in the ease of writing the machine description. Unfortunately, CRSTNA requires more than 25 megabytes to process this description, which is beyond the capability of our current hardware configuration.

In order to generate at least some tables, the rewrite rules describing register-to-register moves and the register-indexed-indirect addressing modes were removed from the machine description. The register-to-register move instructions greatly increase the number of states in the type automaton corresponding to a single type, making the number of local rewrite sequences explode; the register-indexed-indirect addressing modes involve large patterns with three operands, all of which may be rewritten into registers. Since CRSTNA generates all compositions of local rewrite sequences, this is a deadly combination; it might be possible for a better implementation (which only generated and saved compositions that might eventually become useful) to handle the full description without undue amounts of space.

Even so, the results are not promising for machine descriptions. The smaller description, without the register moves and register-indexed-indirect addressing modes, yields a table with 2,964 states; this compares with 362 states for the untyped version. A simple uncompacted representation of the transition tables occupies about 7,000 bytes; Pelegri does not give uncompacted transition table sizes, but his compacted transition tables occupy around 4,000 bytes.

Most of the table size is taken up by the state descriptions in Pelegri's tables, and so the factor-of-ten increase in the number of states (for a machine description that is less powerful) seems likely to make typed rewrite systems unsatisfactory for code generation. In addition, the increased table generation time makes it very difficult to experiment with descriptions. CRSTNA must solve an untyped system just to get the type automaton, before moving on to the (slower) typed system, so this gap in table-generation speeds is an inherent part of the process.

Thus, the experiments suggest that CRSTNA is unsuitable for code generation. The extra power provided by types makes writing the machine description more convenient, but at the expense of a large increase in table size and in table generation time. Both of these problems may be solvable: the former may either evaporate as memories get larger, or might be solved by noting similarities among the states and storing their representations in a compact form, while the latter could be made manageable by a carefully optimized implementation. But in the meantime, untyped systems seem to provide a better combination of table size and generation speed, with an acceptable amount of difficulty in the description writing.
5  Future work

The main principle in CRSTNA's design has been to accept as many rewrite systems as possible while yielding an automaton that can find an optimal rewrite sequence in a bottom-up traversal of the tree. The following problems still need to be solved before leaving this general area of design.

•  Information regarding the possible input trees should be taken into consideration, so that unbounded local rewrite sequences that only occur for impossible input trees can be ignored.

•  A humanly understandable characterization of the rewrite systems that can be accepted should be produced. At the very least, a decidability test should be found.

•  SET-REACHABILITY currently states that a rewrite sequence must be found. Although this is useful for applications that still need to attach semantic actions to specific rewrites, one of our goals is to eliminate the need for such actions. A different casting of SET-REACHABILITY that only requires an output tree, and not the rewrite sequence producing it, may allow a cleaner theory and/or easier solutions to some of the other problems in this section.

•  Different design choices in the Common Lisp implementation need to be explored, to discover if the problems in handling machine descriptions are inherent in the method or simply an artifact of the implementation. In particular, a careful implementation of Definition 11 could vastly increase the size of the description that CRSTNA could process (although it would have no effect on the resulting table sizes).


6  Conclusion 

It is possible, in principle, to solve SET-REACHABILITY very efficiently, given a fixed rewrite system and set of goal trees. Unfortunately, table sizes and table generation times make these results impractical for rewrite systems the size of machine descriptions, given our current technology. It is unclear whether further research could significantly improve either table size or generation time.
Our new algorithm for determining the efficient local rewrite sequences greatly increases the number of rewrite systems that can be handled, with or without types; rewrites that increase the size of the tree may be acceptable in some cases, and any rewrite system without variables can be handled. This result may wind up being the most important contribution of CRSTNA.
After printing of the technical report, an error was discovered in the proof of Proposition 5; in fact, the proposition is false. The false portion of the result is not used in the report; this insert should be read in place of the bound pages 15 and 16, up to Definition 11, which remains unchanged.


Recall that our solution for SET-REACHABILITY involves a bottom-up traversal of the tree that computes all of the interesting rewrite sequences applicable at a given position. The interesting sequences are a subset of the local rewrite sequences:

Definition 9 Let $\tau = \tau_1 \cdots \tau_n \tau_\varepsilon$ be a valid CNF rewrite sequence satisfying the conditions in Definition 7. The local rewrite sequence (LRS) assigned by $\tau$ to a position $p$ is defined by $C(\tau, p)$, where

(1) $C(\tau, \varepsilon) = \tau_\varepsilon$, and

(2) $C(\tau, i//p) = C(\tau_i, p)$.

Given a set U of local rewrite sequences, Pelegri has shown how to modify David Chase's algorithm for pattern matching [1] to compute all possible rewrite sequences that assign only local rewrite sequences in U. Our goal is to find a set U such that the set of rewrite sequences assigning LRS's in U covers the set of rewrite sequences that rewrite trees into the goal set. U should be as small as possible; in particular, it should be finite. Pelegri's algorithm finds all local rewrite sequences that do not loop and that produce a subtree which might eventually be rewritten into a goal tree; unfortunately, this can still involve infinite sets of local rewrite sequences (e.g., sequences that expand and then contract a tree). We obtain a finite set by only considering the set of sequences that are efficient:¹

Definition 10 Let $G$ be a set of trees and let $\tau$ be a local rewrite sequence. $\tau$ is efficient with respect to $G$ if there is no local rewrite sequence $\tau'$ shorter than $\tau$ such that, for all trees $t$ with $\tau(t) \in G$, $\tau'(t) \in G$.
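Definition 10 quantifies over all trees, so it cannot be checked by enumeration in general. On a finite sample domain, however, a brute-force check is easy to write and shows what the definition asks. In this hypothetical sketch, trees are plain strings and each local rewrite is a partial function that returns `None` when it does not apply; the empty sequence counts as a candidate, matching the initial set U_0 = {ε}.

```python
from itertools import product


def ground_rule(lhs, rhs):
    """A local rewrite as a partial function: lhs -> rhs, else None (no match)."""
    return lambda t: rhs if t == lhs else None


def run(seq, t):
    """Apply a sequence of partial rewrites left to right; None if any step fails."""
    for step in seq:
        if t is None:
            return None
        t = step(t)
    return t


def is_efficient(tau, rules, goal, domain):
    """Definition 10, checked by brute force over a finite sample of trees.

    tau is efficient w.r.t. goal unless some strictly shorter sequence over
    `rules` sends every tree that tau sends into goal into goal as well.
    """
    must_reach = [t for t in domain if run(tau, t) in goal]
    for k in range(len(tau)):               # every length shorter than tau
        for cand in product(rules, repeat=k):
            if all(run(cand, t) in goal for t in must_reach):
                return False
    return True


a_to_b, b_to_c, a_to_c = ground_rule("a", "b"), ground_rule("b", "c"), ground_rule("a", "c")
rules = [a_to_b, b_to_c, a_to_c]
domain, goal = ["a", "b", "c"], {"c"}
```

Here the two-step sequence a → b, b → c is not efficient with respect to {c}, because the single rewrite a → c does the same job; a → c itself is efficient.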

If an LRS is interesting only because it rewrites some trees into a particular goal set G, we can ignore it if it is not efficient with respect to G; some other efficient LRS can always take its place. CRSTNA uses the following construction of U:

¹ This is stricter than Pelegri's definition of “efficient”.